7 research outputs found

    Improving biocuration of microRNAs in diseases: a case study in idiopathic pulmonary fibrosis

    Get PDF
    MicroRNAs (miRNAs) are small and non-coding RNA molecules that inhibit gene expression posttranscriptionally. They play important roles in several biological processes, and in recent years there has been an interest in studying how they are related to the pathogenesis of diseases. Although there are already some databases that contain information for miRNAs and their relation with illnesses, their curation represents a significant challenge due to the amount of information that is being generated every day. In particular, respiratory diseases are poorly documented in databases, despite the fact that they are of increasing concern regarding morbidity, mortality and economic impacts. In this work, we present the results that we obtained in the BioCreative Interactive Track (IAT), using a semiautomatic approach for improving biocuration of miRNAs related to diseases. Our procedures will be useful to complement databases that contain this type of information. We adapted the OntoGene text mining pipeline and the ODIN curation system in a full-text corpus of scientific publications concerning one specific respiratory disease: idiopathic pulmonary fibrosis, the most common and aggressive of the idiopathic interstitial cases of pneumonia. We curated 823 miRNA text snippets and found a total of 246 miRNAs related to this disease based on our semiautomatic approach with the system OntoGene/ODIN. The biocuration throughput improved by a factor of 12 compared with traditional manual biocuration. A significant advantage of our semiautomatic pipeline is that it can be applied to obtain the miRNAs of all the respiratory diseases and offers the possibility to be used for other illnesses. Database URL:http://odin.ccg.unam.mx/ODIN/bc2015-miRNA

    Lisen&Curate: a platform to facilitate gathering textual evidence for curation of regulation of transcription initiation in bacteria

    No full text
    The number of published papers in biomedical research makes it rather impossible for a researcher to keep up to date. This is where manually curated databases contribute facilitating the access to knowledge. However, the structure required by databases strongly limits the type of valuable information that can be incorporated. Here, we present Lisen&Curate, a curation system that facilitates linking sentences or part of sentences (both considered sources) in articles with their corresponding curated objects, so that rich additional information of these objects is easily available to users. These sources are going to be offered both within RegulonDB and a new database, L-Regulon. To show the relevance of our work, two senior curators performed a curation of 31 articles on the regulation of transcription initiation of E. coli using Lisen&Curate. As a result, 194 objects were curated and 781 sources were recorded. We also found that these sources are useful to develop automatic approaches to detect objects in articles by observing word frequency patterns and by carrying out an open information extraction task. Sources may help to elaborate a controlled vocabulary of experimental methods. Finally, we discuss our ecosystem of interconnected applications, RegulonDB, L-Regulon, and Lisen&Curate, to facilitate the access to knowledge on regulation of transcription initiation in bacteria. We see our proposal as the starting point to change the way experimentalists connect a piece of knowledge with its evidence using RegulonDB.This study was supported by the Universidad Nacional AutĂłnoma de MĂ©xico (UNAM) and the National Institute of General Medical Sciences of the National Institutes of Health [grants number 5RO1-GM110597-04 and 1RO1-GM131643-01A1

    RegulonDB version 9.0: high-level integration of gene regulation, coexpression, motif clustering and beyond

    No full text
    International audienceRegulonDB (http://regulondb.ccg.unam.mx) is one of the most useful and important resources on bacterial gene regulation, as it integrates the scattered scientific knowledge of the best-characterized organism, Escherichia coli K-12, in a database that organizes large amounts of data. Its electronic format enables researchers to compare their results with the legacy of previous knowledge and supports bioinformatics tools and model building. Here, we summarize our progress with RegulonDB since our last Nucleic Acids Research publication describing RegulonDB, in 2013. In addition to maintaining curation up-to-date, we report a collection of 232 interactions with small RNAs affecting 192 genes, and the complete repertoire of 189 Elementary Genetic Sensory-Response units (GENSOR units), integrating the signal, regulatory interactions, and metabolic pathways they govern. These additions represent major progress to a higher level of understanding of regulated processes. We have updated the computationally predicted transcription factors, which total 304 (184 with experimental evidence and 120 from computational predictions); we updated our position-weight matrices and have included tools for clustering them in evolutionary families. We describe our semiautomatic strategy to accelerate curation, including datasets from high-throughput experiments, a novel coexpression distance to search for `neighborhood' genes to known operons and regulons, and computational developments
    corecore